NSF PAR Search | NSF Public Access Repository

Voice Interaction With Conversational AI Could Facilitate Thoughtful Reflection and Substantive Revision in Writing

https://doi.org/10.18653/v1/2025.in2writing-1.7

Kim, Jiho; Laban, Philippe; Chen, Xiang; Arnold, Kenneth C (May 2025, Association for Computational Linguistics)

Writing well requires not only expressing ideas but also refining them through revision, a process facilitated by reflection. Prior research suggests that feedback delivered through dialogues, such as those in writing center tutoring sessions, can help writers reflect more thoughtfully on their work than static feedback does. Recent advancements in multi-modal large language models (LLMs) now offer new possibilities for supporting interactive and expressive voice-based reflection in writing. In particular, we propose that LLM-generated static feedback can be repurposed as conversation starters, allowing writers to seek clarification, request examples, and ask follow-up questions, thereby fostering deeper reflection on their writing. We argue that voice-based interaction can naturally facilitate this conversational exchange, encouraging writers' engagement with higher-order concerns, facilitating iterative refinement of their reflections, and reducing cognitive load compared to text-based interactions. To investigate these effects, we propose a formative study exploring how text versus voice input influences writers' reflection and subsequent revisions. Findings from this study will inform the design of intelligent and interactive writing tools, offering insights into how voice-based interactions with LLM-powered conversational agents can support reflection and revision.
Free, publicly-accessible full text available May 4, 2026
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents

https://doi.org/10.18653/v1/2024.emnlp-main.499

Tang, Liyan; Laban, Philippe; Durrett, Greg (January 2024, Proceedings of the Conference on Empirical Methods in Natural Language Processing (published by Association for Computational Linguistics))

Full Text Available
Automatic and Human-AI Interactive Text Generation (with a focus on Text Simplification and Revision)

https://doi.org/10.18653/v1/2024.acl-tutorials.2

Dou, Yao; Laban, Philippe; Gardent, Claire; Xu, Wei (January 2024, Association for Computational Linguistics)

Full Text Available
Understanding Factual Errors in Summarization: Errors, Summarizers, Datasets, Error Detectors

https://doi.org/10.18653/v1/2023.acl-long.650

Tang, Liyan; Goyal, Tanya; Fabbri, Alex; Laban, Philippe; Xu, Jiacheng; Yavuz, Semih; Kryscinski, Wojciech; Rousseau, Justin; Durrett, Greg (January 2023, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))

The propensity of abstractive summarization models to make factual errors has been studied extensively, including design of metrics to detect factual errors and annotation of errors in current systems’ outputs. However, the ever-evolving nature of summarization systems, metrics, and annotated benchmarks makes factuality evaluation a moving target, and drawing clear comparisons among metrics has become increasingly difficult. In this work, we aggregate factuality error annotations from nine existing datasets and stratify them according to the underlying summarization model. We compare performance of state-of-the-art factuality metrics, including recent ChatGPT-based metrics, on this stratified benchmark and show that their performance varies significantly across different types of summarization models. Critically, our analysis shows that much of the recent improvement in the factuality detection space has been on summaries from older (pre-Transformer) models instead of more relevant recent summarization models. We further perform a finer-grained analysis per error-type and find similar performance variance across error types for different factuality metrics. Our results show that no one metric is superior in all settings or for all error types, and we provide recommendations for best practices given these insights.
Full Text Available
